
Linear Algebra

Basis Space and Basis Vectors

Imagine these in 2D as 'tiling' a vector space. Imagine making a grid with those long pieces from Erector Sets. You can shear/smush them only at certain angles. Now imagine that the long pieces can only stretch or shrink lengthwise. That's kinda what these are.

Now "applying" a Matrix means that you change this Vector Space and all the vectors you've embedded in it in some way. You shrink it, stretch it, rotate it by some angle, flip it inside-out, or just leave it alone! In some cases, you can even change your mind and smash an undo button (that's what an Inverse is; see Invertible Matrix below).
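
Here's a minimal numpy sketch of that idea. The matrices are just examples I picked: a shear and a quarter-turn rotation, applied to the standard 2D basis vectors.

```python
import numpy as np

# Standard 2D basis vectors
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Two example transformations (my own picks): a shear and a 90-degree rotation
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
rotate_90 = np.array([[0.0, -1.0],
                      [1.0,  0.0]])

print(shear @ e1, shear @ e2)          # [1. 0.] [1. 1.]  -- the grid gets smushed sideways
print(rotate_90 @ e1, rotate_90 @ e2)  # [0. 1.] [-1. 0.] -- the grid gets spun a quarter turn
```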

Vector Norms

These functions give you an idea of the 'size' or 'length' of a vector. There's a whole family of them, but here are the three you'll see most (quick numpy check after the list):

  1. Euclidean: $\|a\|_2 = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$
  2. Manhattan: $\|a\|_1 = |a_1| + |a_2| + \cdots + |a_n|$
  3. Infinity: $\|a\|_\infty = \max_i |a_i|$
    AKA Fuck It I'm Tired Norm
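
Here's that quick numpy check of all three, on a vector I made up:

```python
import numpy as np

a = np.array([3.0, -4.0, 12.0])

print(np.linalg.norm(a, 2))       # Euclidean: sqrt(9 + 16 + 144) = 13.0
print(np.linalg.norm(a, 1))       # Manhattan: 3 + 4 + 12 = 19.0
print(np.linalg.norm(a, np.inf))  # Infinity:  max(|3|, |-4|, |12|) = 12.0
```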

Eigenvalues and Eigenvectors

TODO: Finish this.

Dot and Cross Products

Vectors Only Please

Note that Dot and Cross Products are only defined for Vectors. I mean there are things like the Kronecker Product but that's not what we're dealing with here.

Dot Products

These are easy-peasy and tell you how well two vectors vibe with each other. The result is a number. Consider two vectors of the same size, $\bold{a},\bold{b} \in \mathbb{R}^n$:

$$\bold{a} \cdot \bold{b} = \|\bold{a}\|\,\|\bold{b}\|\cos\theta = \sum_{k=1}^n a_k b_k$$

That's about it. If you get a zero, they're orthogonal (at $90^\circ$ in 2D space). That Cosine is a good similarity measure that's used in all manner of Machine Learning algos like LLMs. E.g. recall that $\cos(90^\circ) = 0$, which you can take to mean that they're not similar at all.
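
A tiny numpy sketch (the vectors are made up) showing the dot product and the cosine similarity it buys you:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([-2.0, 1.0, 0.0])

dot = np.dot(a, b)                                         # 1*(-2) + 2*1 + 3*0 = 0.0
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))

print(dot)        # 0.0 -- orthogonal
print(cos_theta)  # 0.0 -- "not similar at all"
```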

Cross Products

These work in 3D for the most part and give you a new vector that is orthogonal/perpendicular to the plane of the two input vectors (which are 3D!). I've never used them for anything. Read this for more.
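
If you ever do need one, here's a minimal numpy sketch, using the standard basis vectors as inputs:

```python
import numpy as np

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

w = np.cross(u, v)
print(w)                           # [0. 0. 1.] -- perpendicular to both u and v
print(np.dot(w, u), np.dot(w, v))  # 0.0 0.0    -- orthogonality check via dot products
```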

Matrix Rank

This is an easy concept but is pretty important downstream. It's the number of linearly independent rows or columns of a matrix.

When do you pick rows versus columns? Whichever is smaller: if you have a 'rectangular' $m \times n$ matrix (always rows $\times$ columns), $\text{rank} \leq \min(m, n)$.

A "Full Rank" matrix is one where there are no linearly dependent (not independent!) rows or columns (whichever set is smaller). So if you have a matrix that's 4 rows and 3 columns, the maximum rank possible is 3. Now look at the columns and see if you can figure out if one column depends on the others. Didn't find any? Awesome, you have a Full Rank matrix.

Found one that depends on the others? Your rank is 2. Found two? Rank 1. See this Wikipedia article on Row Echelon Forms for more.
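
A quick numpy check with two made-up 4-rows-by-3-columns matrices, one rank-deficient and one Full Rank:

```python
import numpy as np

# 4 rows x 3 columns, so the rank can be at most 3
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(A))  # 2 -- the middle column is the average of the other two

B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(B))  # 3 -- no column depends on the others, so Full Rank
```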

Identity Matrix

A nice simple square matrix that looks like this.

$$I_n = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$
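
A one-line sanity check in numpy (the matrix A here is arbitrary):

```python
import numpy as np

I = np.eye(3)                                # 3x3 identity
A = np.arange(9, dtype=float).reshape(3, 3)  # arbitrary 3x3 matrix

print(np.allclose(I @ A, A), np.allclose(A @ I, A))  # True True -- I leaves A alone
```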

Determinants

This gives you a scalar (boring-ass number) from your matrix. This number tells you how much applying the matrix stretches or shrinks an area (2D) or volume (3D+) of a space. If the number's negative, the orientation flips and you get a mirror image.
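
A minimal numpy sketch with two made-up 2x2 matrices, one that scales area and one that mirrors it:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])     # stretches x by 2 and y by 3
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])  # swaps the axes -> mirror image

print(np.linalg.det(A))     # 6.0  -- the unit square becomes a 2x3 rectangle
print(np.linalg.det(flip))  # -1.0 -- same area, but orientation flips
```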

TODO: More here...

Commutativity

In general, $AB \ne BA$. You can verify this yourself with two $2 \times 2$ matrices (there's a quick numpy check after the list). But there are cases where this holds:

  • $AI = IA = A$
  • $A0 = 0A = 0$
  • If $B = \lambda I$ for some scalar $\lambda$, then $AB = \lambda A = BA$ (i.e. you can scale the Identity Matrix all you want)
  • Diagonal matrices always commute with each other...
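
Here's that quick check, with matrices I made up, covering the general case plus the Identity and scaled-Identity cases:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(np.allclose(A @ B, B @ A))          # False -- order matters in general
print(np.allclose(A @ np.eye(2), A))      # True  -- AI = A
lam_I = 3 * np.eye(2)                     # lambda * I with lambda = 3
print(np.allclose(A @ lam_I, lam_I @ A))  # True  -- a scaled Identity commutes
```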

Invertible Matrix

This is a square matrix $A \in \mathbb{R}^{n \times n}$ that has some other square matrix $B \in \mathbb{R}^{n \times n}$ such that

$$AB = BA = I_n$$

This other square matrix $B$ is the Inverse of $A$ and is denoted $A^{-1}$. An invertible $A$ has some properties (sanity-checked in numpy after the list):

  1. Its Determinant is not zero.
  2. It has Full Rank
  3. If $x$ is some vector ($x \in \mathbb{R}^n$), $Ax = 0$ has only one solution: $x$ is full of zeroes!
  4. If $b$ is some vector ($b \in \mathbb{R}^n$), $Ax = b$ has just one solution: $x = A^{-1}b$
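
And the promised numpy check, on an invertible matrix I made up:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)

print(np.linalg.det(A))                   # ~1.0 -- non-zero, so invertible
print(np.linalg.matrix_rank(A))           # 2 -- Full Rank
print(np.allclose(A @ A_inv, np.eye(2)))  # True -- AB = I
print(np.allclose(A_inv @ A, np.eye(2)))  # True -- BA = I

b = np.array([3.0, 2.0])
print(np.linalg.solve(A, b))              # [1. 1.] -- the one and only x with Ax = b
```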

Singular Matrix

This is a square matrix ($n \times n$) where

  1. The Determinant is Zero
  2. It's not Full Rank
  3. There's some non-zero vector $x$ such that $Ax = 0$
  4. It is not invertible!

These things smush a vector space into lower dimensions. Well, really, they map vectors into a lower-dimensional subspace (the original space itself is untouched), but yeah.
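
A minimal numpy sketch of a made-up singular matrix ticking those boxes:

```python
import numpy as np

# The second column is 2x the first, so this matrix is singular
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.det(A))           # 0.0
print(np.linalg.matrix_rank(A))   # 1 -- not Full Rank
print(A @ np.array([2.0, -1.0]))  # [0. 0.] -- a non-zero x with Ax = 0
# np.linalg.inv(A) raises LinAlgError here -- not invertible
```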

Transposes

Transposes are when you turn a matrix $A$'s rows into columns and vice-versa and denote the monstrosity $A^T$. They're just a different kind of transformation and are useful depending on the problem you're trying to solve. They have some properties (checked in numpy after the list):

  • $(A^T)^T = A$
  • $(AB)^T = B^TA^T$
  • $(A+B)^T = A^T + B^T$
  • $\det(A^T) = \det(A)$
  • $(\alpha A)^T = \alpha A^T$
  • $(A^{-1})^T = (A^T)^{-1}$
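
A quick numpy sanity check of these, with two matrices I made up:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [2.0, 5.0]])

print(np.allclose((A.T).T, A))                              # True
print(np.allclose((A @ B).T, B.T @ A.T))                    # True
print(np.allclose((A + B).T, A.T + B.T))                    # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))     # True
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))  # True
```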

Miscellaneous

Other Types of Matrices

  • An Orthogonal matrix is one where $A^T = A^{-1}$
  • A Symmetric matrix is one where $A = A^T$
  • A Conjugate matrix just flips the sign of the imaginary part of any complex numbers in a matrix.
  • A Hermitian matrix is when a matrix equals its Conjugate Transpose: $A = \bar{A}^T$
    Pretty important in ML and Quantum Mechanics
  • TODO: Conjugate and Adjoint matrices...
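
Quick numpy checks of the Orthogonal, Symmetric, and Hermitian definitions above (the example matrices are mine):

```python
import numpy as np

# Orthogonal: a rotation matrix, so its transpose is its inverse
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T, np.linalg.inv(Q)))  # True

# Symmetric: equal to its own transpose
S = np.array([[2.0, 7.0],
              [7.0, 5.0]])
print(np.allclose(S, S.T))                 # True

# Hermitian: equal to its conjugate transpose
H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
print(np.allclose(H, H.conj().T))          # True
```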

Cramer's Rule

Easier shown with an example. Heaven forbid you compute things by hand these days...

Solve $A\mathbf{x}=\mathbf{b}$ with

$$A=\begin{bmatrix} 2 & 1 & -1\\ -3 & -1 & 2\\ -2 & 1 & 2 \end{bmatrix},\quad \mathbf{x}=\begin{bmatrix}x\\y\\z\end{bmatrix},\quad \mathbf{b}=\begin{bmatrix}8\\-11\\-3\end{bmatrix}.$$

Here $\det(A) = -1$.

Replace the $i$-th column of $A$ by $\mathbf{b}$ to get $A_i$:

$$A_1=\begin{bmatrix} 8 & 1 & -1\\ -11 & -1 & 2\\ -3 & 1 & 2 \end{bmatrix},\quad A_2=\begin{bmatrix} 2 & 8 & -1\\ -3 & -11 & 2\\ -2 & -3 & 2 \end{bmatrix},\quad A_3=\begin{bmatrix} 2 & 1 & 8\\ -3 & -1 & -11\\ -2 & 1 & -3 \end{bmatrix}.$$

$$\det(A_1)=-2,\qquad \det(A_2)=-3,\qquad \det(A_3)=1.$$

By Cramerโ€™s Rule,

$$x=\frac{\det(A_1)}{\det(A)}=\frac{-2}{-1}=2,\qquad y=\frac{\det(A_2)}{\det(A)}=\frac{-3}{-1}=3,\qquad z=\frac{\det(A_3)}{\det(A)}=\frac{1}{-1}=-1.$$

$$\boxed{(x,y,z)=(2,\,3,\,-1)}$$
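
Same computation in numpy, spelling out Cramer's Rule column by column (with `np.linalg.solve` as a cross-check):

```python
import numpy as np

A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

det_A = np.linalg.det(A)      # ~ -1
x = np.empty(3)
for i in range(3):
    A_i = A.copy()
    A_i[:, i] = b             # replace the i-th column of A with b
    x[i] = np.linalg.det(A_i) / det_A

print(x)                      # [ 2.  3. -1.]
print(np.linalg.solve(A, b))  # same answer, the way you'd actually do it
```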